AITopics | monocular camera

monocular camera

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

MonoSpheres: Large-Scale Monocular SLAM-Based UAV Exploration through Perception-Coupled Mapping and Planning

Musil, Tomáš, Petrlík, Matěj, Saska, Martin

arXiv.org Artificial Intelligence, Nov-24-2025

Autonomous exploration of unknown environments is a key capability for mobile robots, but it is largely unsolved for robots equipped with only a single monocular camera and no dense range sensors. In this paper, we present a novel approach to monocular vision-based exploration that can safely cover large-scale unstructured indoor and outdoor 3D environments by explicitly accounting for the properties of a sparse monocular SLAM frontend in both mapping and planning. The mapping module solves the problems of sparse depth data, free-space gaps, and large depth uncertainty by oversampling free space in texture-sparse areas and keeping track of obstacle position uncertainty. The planning module handles the added free-space uncertainty through rapid replanning and perception-aware heading control. We further show that frontier-based exploration is possible with sparse monocular depth data when parallax requirements and the possibility of textureless surfaces are taken into account. We evaluate our approach extensively in diverse real-world and simulated environments, including ablation studies. To the best of the authors' knowledge, the proposed method is the first to achieve 3D monocular exploration in real-world unstructured outdoor environments. We open-source our implementation to support future research.
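The frontier-based exploration mentioned in the abstract can be illustrated with a toy example. The sketch below is not the MonoSpheres method (which works in 3D and accounts for parallax and textureless surfaces); it only shows the generic idea of a frontier as a known-free cell bordering unknown space, on a hypothetical 2D occupancy grid. The cell labels and the function name `find_frontiers` are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical cell labels for a toy 2D occupancy grid (not from the paper).
FREE, OCCUPIED, UNKNOWN = 0, 1, -1


def find_frontiers(grid: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (row, col) of every free cell with a 4-connected unknown neighbour."""
    rows, cols = len(grid), len(grid[0])
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue  # only known-free cells can be frontiers
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break  # one unknown neighbour is enough
    return frontiers


if __name__ == "__main__":
    demo_grid = [
        [FREE, FREE, UNKNOWN],
        [FREE, OCCUPIED, UNKNOWN],
        [FREE, FREE, FREE],
    ]
    print(find_frontiers(demo_grid))
```

With sparse monocular depth, many cells stay unknown simply because the surface lacks texture, which is why the abstract stresses that frontier selection has to account for parallax and textureless regions rather than treating every such boundary as a goal.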

artificial intelligence, exploration, viewpoint, (15 more...)

arXiv:2511.17299

Genre: Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

3D Mapping Using a Lightweight and Low-Power Monocular Camera Embedded inside a Gripper of Limbed Climbing Robots

Okawara, Taku, Nishibe, Ryo, Kasano, Mao, Uno, Kentaro, Yoshida, Kazuya

arXiv.org Artificial Intelligence, Nov-11-2025

Limbed climbing robots are designed to explore challenging vertical walls, such as the skylights of the Moon and Mars. In such robots, the primary role of a hand-eye camera is to accurately estimate 3D positions of graspable points (i.e., convex terrain surfaces) thanks to its close-up views. While conventional climbing robots often employ RGB-D cameras as hand-eye cameras to facilitate straightforward 3D terrain mapping and graspable point detection, RGB-D cameras are large and consume considerable power. This work presents a 3D terrain mapping system designed for space exploration using limbed climbing robots equipped with a monocular hand-eye camera. Compared to RGB-D cameras, monocular cameras are lighter, more compact, and consume less power. Although monocular SLAM can be used to construct 3D maps, it suffers from scale ambiguity. To address this limitation, we propose a SLAM method that fuses monocular visual constraints with limb forward kinematics. The proposed method jointly estimates time-series gripper poses and the global metric scale of the 3D map based on factor graph optimization. We validate the proposed framework through both physics-based simulations and real-world experiments. The results demonstrate that our framework constructs a metrically scaled 3D terrain map in real time and enables autonomous grasping of convex terrain surfaces using a monocular hand-eye camera, without relying on RGB-D cameras. Our method contributes to scalable and energy-efficient perception for future space missions involving limbed climbing robots. See the video summary here: https://youtu.be/fMBrrVNKJfc
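To make the scale-ambiguity point concrete, here is a minimal sketch of how a single global scale could be recovered from metric kinematic data. It is a closed-form least-squares stand-in, not the factor graph optimization the abstract describes. It assumes the up-to-scale SLAM displacements and the metric forward-kinematics displacements of the gripper are already paired and expressed in the same frame; the function name `estimate_metric_scale` is hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def estimate_metric_scale(slam_disp: Sequence[Vec3], kin_disp: Sequence[Vec3]) -> float:
    """Least-squares scale s minimising sum ||s * slam_i - kin_i||^2.

    slam_disp: gripper displacements from monocular SLAM (arbitrary scale).
    kin_disp:  matching displacements from limb forward kinematics (metres).
    """
    if len(slam_disp) != len(kin_disp):
        raise ValueError("displacement lists must have the same length")
    num = 0.0  # sum of dot(slam_i, kin_i)
    den = 0.0  # sum of dot(slam_i, slam_i)
    for s, k in zip(slam_disp, kin_disp):
        num += sum(si * ki for si, ki in zip(s, k))
        den += sum(si * si for si in s)
    if den == 0.0:
        raise ValueError("no SLAM motion observed; scale is unobservable")
    return num / den


if __name__ == "__main__":
    slam = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
    kin = [(0.5, 0.0, 0.0), (0.0, 1.0, 0.0)]
    print(estimate_metric_scale(slam, kin))
```

A joint optimization, as in the paper, would additionally refine the poses themselves and weight each constraint by its uncertainty; the closed form above only shows why metric limb motion makes the scale observable.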

artificial intelligence, robot, terrain surface, (18 more...)

arXiv:2511.05816

Country:

North America > United States > Oklahoma > Payne County > Cushing (0.04)
Asia > Japan > Kyūshū & Okinawa > Kyūshū > Kumamoto Prefecture > Kumamoto (0.04)
Asia > Japan > Honshū > Tōhoku > Miyagi Prefecture > Sendai (0.04)
Asia > Japan > Honshū > Kantō > Ibaraki Prefecture > Tsukuba (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Robots > Locomotion (1.00)

Aucamp: An Underwater Camera-Based Multi-Robot Platform with Low-Cost, Distributed, and Robust Localization

Xu, Jisheng, Lin, Ding, Fong, Pangkit, Fang, Chongrong, Duan, Xiaoming, He, Jianping

arXiv.org Artificial Intelligence, Jun-12-2025

This paper introduces an underwater multi-robot platform, named Aucamp, characterized by cost-effective monocular-camera-based sensing, a distributed protocol, and robust orientation control for localization. We utilize the clarity feature to measure the distance, present the monocular imaging model, and estimate the position of the target object. We achieve global positioning in our platform by designing a distributed update protocol. The distributed algorithm enables the perception process to simultaneously cover a broader range, and greatly improves the accuracy and robustness of the positioning. Moreover, the explicit dynamics model of the robot in our platform is obtained, based on which we propose a robust orientation control framework. The control system ensures that the platform maintains a balanced posture for each robot, thereby ensuring the stability of the localization system. The platform can swiftly recover from a forced unstable state to a stable horizontal posture. Additionally, we conduct extensive experiments and present application scenarios to evaluate the performance of our platform. The proposed platform may provide support for extensive marine exploration by underwater sensor networks.

artificial intelligence, platform, robot, (18 more...)

arXiv:2506.09876

Country:

Europe (0.67)
North America > United States (0.46)
Asia > China (0.32)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Communications > Networks > Sensor Networks (0.89)

Design of a Formation Control System to Assist Human Operators in Flying a Swarm of Robotic Blimps

Wu, Tianfu, Fu, Jiaqi, Meng, Wugang, Cho, Sungjin, Zhan, Huanzhe, Zhang, Fumin

arXiv.org Artificial Intelligence, May-15-2025

Formation control is essential for swarm robotics, enabling coordinated behavior in complex environments. In this paper, we introduce a novel formation control system for an indoor blimp swarm using a specialized leader-follower approach enhanced with a dynamic leader-switching mechanism. This strategy allows any blimp to take on the leader role, distributing maneuvering demands across the swarm and enhancing overall formation stability. Only the leader blimp is manually controlled by a human operator, while follower blimps use onboard monocular cameras and a laser altimeter for relative position and altitude estimation. A leader-switching scheme is proposed to assist the human operator in maintaining the stability of the swarm, especially when a sharp turn is performed. Experimental results confirm that the leader-switching mechanism effectively maintains stable formations and adapts to dynamic indoor environments while assisting the human operator.

algorithm, artificial intelligence, blimp, (17 more...)

arXiv:2505.09511

Country:

Asia > China > Hong Kong (0.05)
Asia > China > Beijing > Beijing (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
Asia > South Korea (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Transportation > Passenger (1.00)
Transportation > Air (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.95)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.46)

Reinforcement Learning-Based Monocular Vision Approach for Autonomous UAV Landing

Houichime, Tarik, Amrani, Younes EL

arXiv.org Artificial Intelligence, May-13-2025

This paper introduces an innovative approach for the autonomous landing of Unmanned Aerial Vehicles (UAVs) using only a front-facing monocular camera, thereby obviating the requirement for depth estimation cameras. Drawing on the way humans estimate distance, the proposed method reframes the landing task as an optimization problem. The UAV exploits variations in the visual characteristics of a specially designed lenticular circle on the landing pad, where the perceived color and form provide critical information for estimating both altitude and depth. Reinforcement learning algorithms are utilized to approximate the functions governing these estimations, enabling the UAV to ascertain ideal landing settings via training. This method's efficacy is assessed through simulations and experiments, showcasing its potential for robust and accurate autonomous landing without dependence on complex sensor setups. This research contributes to the advancement of cost-effective and efficient UAV landing solutions, paving the way for wider applicability across various fields.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv:2505.06963

Genre: Research Report > Promising Solution (0.34)

Industry: Aerospace & Defense (0.58)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Unified Human Localization and Trajectory Prediction with Monocular Vision

Luan, Po-Chien, Gao, Yang, Demonsant, Celine, Alahi, Alexandre

arXiv.org Artificial Intelligence, Mar-5-2025

Conventional human trajectory prediction models rely on clean curated data, requiring specialized equipment or manual labeling, which is often impractical for robotic applications. Existing predictors tend to overfit to clean observations, which reduces their robustness when used with noisy inputs. In this work, we propose MonoTransmotion (MT), a Transformer-based framework that uses only a monocular camera to jointly solve localization and prediction tasks. Our framework has two main modules: Bird's Eye View (BEV) localization and trajectory prediction. The BEV localization module estimates the position of a person using 2D human poses, enhanced by a novel directional loss for smoother sequential localizations. The trajectory prediction module predicts future motion from these estimates. We show that by jointly training both tasks with our unified framework, our method is more robust in real-world scenarios with noisy inputs. We validate our MT network on both curated and non-curated datasets. On the curated dataset, MT achieves around 12% improvement over baseline models on BEV localization and trajectory prediction. On the real-world non-curated dataset, experimental results indicate that MT maintains similar performance levels, highlighting its robustness and generalization capability. The code is available at https://github.com/vita-epfl/MonoTransmotion.

localization, prediction, trajectory prediction, (13 more...)

arXiv:2503.03535

Country:

South America > Brazil (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Event-Based Adaptive Koopman Framework for Optic Flow-Guided Landing on Moving Platforms

Banday, Bazeela, Sah, Chandan Kumar, Keshavan, Jishnu

arXiv.org Artificial Intelligence, Feb-3-2025

This paper presents an optic flow-guided approach for achieving soft landings by resource-constrained unmanned aerial vehicles (UAVs) on dynamic platforms. An offline data-driven linear model based on Koopman operator theory is developed to describe the underlying (nonlinear) dynamics of optic flow output obtained from a single monocular camera that maps to vehicle acceleration as the control input. Moreover, a novel adaptation scheme within the Koopman framework is introduced online to handle uncertainties such as unknown platform motion and ground effect, which exert a significant influence during the terminal stage of the descent process. Further, to minimize computational overhead, an event-based adaptation trigger is incorporated into an event-driven Model Predictive Control (MPC) strategy to regulate optic flow and track a desired reference. A detailed convergence analysis ensures global convergence of the tracking error to a uniform ultimate bound while ensuring Zeno-free behavior. Simulation results demonstrate the algorithm's robustness and effectiveness in landing on dynamic platforms under ground effect and sensor noise, which compares favorably to non-adaptive event-triggered and time-triggered adaptive schemes.

artificial intelligence, machine learning, platform, (16 more...)

arXiv:2501.16868

Genre: Research Report > New Finding (0.88)

Industry:

Aerospace & Defense (0.88)
Energy > Oil & Gas (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.34)

Pose, Velocity and Landmark Position Estimation Using IMU and Bearing Measurements

Wang, Miaomiao, Tayebi, Abdelhamid

arXiv.org Artificial Intelligence, Jul-25-2024

This paper investigates the estimation problem of the pose (orientation and position) and linear velocity of a rigid body, as well as the landmark positions, using an inertial measurement unit (IMU) and a monocular camera. First, we propose a globally exponentially stable (GES) linear time-varying (LTV) observer for the estimation of body-frame landmark positions and velocity, using IMU and monocular bearing measurements. Thereafter, using the gyro measurements, some landmarks known in the inertial frame and the estimates from the LTV observer, we propose a nonlinear pose observer on $\mathrm{SO}(3)\times \mathbb{R}^3$. The overall estimation system is shown to be almost globally asymptotically stable (AGAS) using the notion of almost global input-to-state stability (ISS). Interestingly, we show that with the knowledge (in the inertial frame) of a small number of landmarks, we can recover (under some conditions) the unknown positions (in the inertial frame) of a large number of landmarks. Numerical simulation results are presented to illustrate the performance of the proposed estimation scheme.

estimation, landmark position, observer, (15 more...)

arXiv:2407.18099

Country:

North America > Canada > Ontario > Thunder Bay (0.04)
North America > Canada > Ontario > Middlesex County > London (0.04)
Asia > China > Hubei Province > Wuhan (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Robots (0.46)

MCGMapper: Light-Weight Incremental Structure from Motion and Visual Localization With Planar Markers and Camera Groups

Xie, Yusen, Huang, Zhenmin, Chen, Kai, Zhu, Lei, Ma, Jun

arXiv.org Artificial Intelligence, May-26-2024

Structure from Motion (SfM) and visual localization in indoor texture-less scenes and industrial scenarios are prevalent yet challenging research topics. Existing SfM methods designed for natural scenes typically yield low accuracy or map-building failures because robust features are hard to extract in such settings. Visual markers, with their artificially designed features, can effectively address these issues. Nonetheless, existing marker-assisted SfM methods suffer from slow running speed and convergence difficulties, and they rely on the strong assumption of a single marker size. In this paper, we propose a novel SfM framework that utilizes planar markers and multiple cameras with known extrinsics to capture the surrounding environment and reconstruct the marker map. In our algorithm, the initial poses of markers and cameras are calculated with Perspective-n-Points (PnP) in the front-end, while bundle adjustment methods customized for markers and camera groups are designed in the back-end to optimize the 6-DOF pose directly. Our algorithm facilitates the reconstruction of large scenes with different marker sizes, and its accuracy and speed of map building are shown to surpass existing methods. Our approach is suitable for a wide range of scenarios, including laboratories, basements, warehouses, and other industrial settings. Furthermore, we incorporate representative scenarios into simulations and also supply our datasets with pose labels to address the scarcity of quantitative ground-truth datasets in this research field. The datasets and source code are available on GitHub.

algorithm, camera group, dataset, (12 more...)

arXiv:2405.16599

Country:

Asia > China > Hong Kong (0.05)
Asia > China > Guangdong Province > Guangzhou (0.05)
Europe > Spain > Galicia > Madrid (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Artificial Intelligence > Robots (0.69)
Information Technology > Sensing and Signal Processing > Image Processing (0.68)

DisBeaNet: A Deep Neural Network to augment Unmanned Surface Vessels for maritime situational awareness

Vemula, Srikanth, Franco, Eulises, Frye, Michael

arXiv.org Artificial Intelligence, May-17-2024

Intelligent detection and tracking of vessels at sea play a significant role in traffic avoidance for unmanned surface vessels (USVs). Current traffic avoidance software relies mainly on the Automated Identification System (AIS) and radar to track other vessels and avoid collisions, and acts as a typical perception system to detect targets. However, in a contested environment, emitting radar energy also presents the vulnerability to detection by adversaries. Deactivating these radio-frequency transmitting sources will increase the threat of detection and degrade the USV's ability to monitor shipping traffic in the vicinity. Therefore, this paper presents an intelligent visual perception system, based on an onboard camera with passive sensing capabilities, that aims to assist USVs in addressing this problem. The system is a novel low-cost vision perception system, built on a deep learning framework, for detecting and tracking vessels in the maritime environment. A neural network, DisBeaNet, can detect and track vessels and estimate each vessel's distance and bearing from the monocular camera. The outputs obtained from this neural network are used to determine the latitude and longitude of the identified vessel.
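The last step of the abstract, turning a distance and bearing into a latitude and longitude, can be sketched with the standard great-circle destination formula. The abstract does not say which formula the authors use, so this is an illustration under stated assumptions: a spherical Earth, the USV's own position known (for example from GPS), and a bearing given as a true bearing in degrees clockwise from north. If the network reports a bearing relative to the camera axis, the vessel heading would have to be added first. The function name `target_lat_lon` is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation (assumption)


def target_lat_lon(lat_deg: float, lon_deg: float,
                   bearing_deg: float, distance_m: float) -> tuple:
    """Great-circle destination point from an observer position.

    lat_deg, lon_deg: observer (USV) position in degrees.
    bearing_deg:      true bearing to the target, clockwise from north.
    distance_m:       range to the target in metres.
    Returns (latitude, longitude) of the target in degrees.
    """
    lat1 = math.radians(lat_deg)
    lon1 = math.radians(lon_deg)
    theta = math.radians(bearing_deg)
    delta = distance_m / EARTH_RADIUS_M  # angular distance in radians

    lat2 = math.asin(math.sin(lat1) * math.cos(delta)
                     + math.cos(lat1) * math.sin(delta) * math.cos(theta))
    lon2 = lon1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(lat1),
                             math.cos(delta) - math.sin(lat1) * math.sin(lat2))
    # Wrap longitude into [-180, 180) degrees.
    lon2 = (lon2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.degrees(lat2), math.degrees(lon2)


if __name__ == "__main__":
    # Hypothetical observer position and detection, for illustration only.
    print(target_lat_lon(36.6, -121.9, 45.0, 1500.0))
```

At the short ranges typical of camera-based detection, a flat-Earth approximation would give nearly the same answer; the spherical form is used here only because it stays valid at any range.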

bearing, disbeanet neural network, neural network, (8 more...)

arXiv:2405.06149

Country:

North America > United States > California > Monterey County > Monterey (0.04)
Europe > Greece (0.04)

Genre: Research Report (0.50)

Industry: Government > Military (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
